-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky unit tests #2904
Fix flaky unit tests #2904
Conversation
Hi @cblecker. Thanks for your PR. I'm waiting for a operator-framework member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc @exdx @grokspawn |
Signed-off-by: Christoph Blecker <cblecker@redhat.com>
Signed-off-by: Christoph Blecker <cblecker@redhat.com>
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED Approval requirements bypassed by manually added approval. This pull-request has been approved by: ankitathomas, cblecker The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description of the change:
This fixes two flaky unit tests
TestResolver
andTestUpdates
.First,
TestResolver
, specifically theFailForwardEnabled/3EntryReplacementChain/ReplacementChainBroken/NotSatisfiable
test case.This test was changed in #2788 and currently references one of the three failed CSVs in the test namespace.
However, when you run this test multiple times, the resolver cache is sometimes handing back a different CSV (the following was a purposefully broken match in order to always print the test result):
Note how sometimes the resolver is returning
catsrc-namespace/a.v1
and sometimes it'scatsrc-namespace/a.v2
.As I understand this test, the specific CSV doesn't matter for the error as all three are in the
CSVPhaseFailed
state. Therefore, my proposed fix removes the CSV name from the error match, so that it will match on any of the three CSVs. I believe this was the original intent of the test, looking at what it was prior to #2788 (noting that the error matching was split in two, based on the same omission of the CSV name).Second,
TestUpdates
. In this piece of code:operator-lifecycle-manager/pkg/controller/operators/olm/operator_test.go
Lines 3925 to 3929 in 6ffec4d
The for loop will keep hammering the fake operator with
op.syncClusterServiceVersion(csv)
andGet
calls as fast as it possibly can. However, this is creating a race condition in the*RaceFreeFakeWatcher
that is watching the fakeOperatorGroup
. Basically the watch channel is filling up (default watch channel length is 100 events, and if it fills up and closes, the go routine panics) more quickly than the operator can drain it.The proposed fix here creates a sleep of 1ms between instances of this for loop. It only slows the test down negligibly, but it's enough to help the watch channel drain faster than it fills. In a real world Kubernetes API server, responses aren't going to be that fast anyways.
Motivation for the change:
Fix flaky unit tests, because they are the worst.
Architectural changes:
Testing remarks:
With fix, ran test 100 times to verify flake is gone:
This compared to the current HEAD, which this test fails somewhere between 10-20% of the time.
Reviewer Checklist
/doc
[FLAKE]
are truly flaky and have an issue